Automatic phonetic base form generation based on maximum context tree
Abstract
To improve the performance and usability of speech recognition devices, most applications need to let users add new words to the system vocabulary or personalize existing ones. Voice tagging is a simple example: it uses speaker-dependent spoken samples to generate baseform transcriptions of the spoken words. More sophisticated techniques use both spoken samples and the text of the new words to generate baseform transcriptions. In this paper, we propose an approach based on a maximum context tree (MCT) and compare it with the common decision-tree-based method and with the Pronunciation by Analogy (PbA) approach. The new approach gives exact baseform transcriptions for in-vocabulary words and outperforms the decision-tree method. It also performs significantly better than PbA while using less memory. MCT uses word segment probabilities rather than the frequency counts used in PbA, and it uses the full context of the focus letter to overcome some deficiencies of the PbA approach.
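The idea of predicting each letter's phone from the widest context seen in training can be illustrated with a toy sketch. This is not the authors' MCT algorithm: it assumes a lexicon already aligned one phone symbol per letter (with `_` marking silent letters), and the names `train`, `predict`, `BOUNDARY`, `NULL`, the toy lexicon, and the tie-breaking rule are all hypothetical. It picks phones by relative frequency within the widest matching context and does not reproduce the paper's word segment probabilities.

```python
from collections import Counter, defaultdict

BOUNDARY = "#"  # hypothetical word-boundary symbol
NULL = "_"      # hypothetical phone symbol for a silent letter


def contexts(padded, p):
    """Yield every (left, letter, right) context around position p, widest first."""
    n = len(padded)
    pairs = [(l, r) for l in range(p + 1) for r in range(n - p)]
    # Widest total context first; ties broken by preferring more left context (arbitrary choice).
    pairs.sort(key=lambda lr: (lr[0] + lr[1], lr[0]), reverse=True)
    for l, r in pairs:
        yield (padded[p - l:p], padded[p], padded[p + 1:p + 1 + r])


def train(lexicon):
    """Count the phone emitted in every context of every letter of an aligned lexicon."""
    table = defaultdict(Counter)
    for word, phones in lexicon.items():
        assert len(word) == len(phones), "sketch assumes one phone symbol per letter"
        padded = BOUNDARY + word + BOUNDARY
        for i, phone in enumerate(phones):
            for ctx in contexts(padded, i + 1):
                table[ctx][phone] += 1
    return table


def predict(table, word):
    """Transcribe a word letter by letter using the widest context seen in training."""
    padded = BOUNDARY + word + BOUNDARY
    output = []
    for i in range(len(word)):
        for ctx in contexts(padded, i + 1):
            if ctx in table:
                counts = table[ctx]
                # Highest relative frequency P(phone | ctx); ties go to the alphabetically first phone.
                output.append(max(sorted(counts), key=counts.__getitem__))
                break
        else:
            output.append("?")  # letter never seen in training
    return [ph for ph in output if ph != NULL]


# Hypothetical toy lexicon, already aligned letter-to-phone.
LEXICON = {
    "cat": ["k", "ae", "t"],
    "cot": ["k", "aa", "t"],
    "city": ["s", "ih", "t", "iy"],
    "cite": ["s", "ay", "t", NULL],
}

if __name__ == "__main__":
    table = train(LEXICON)
    print(predict(table, "cite"))  # in-vocabulary word
    print(predict(table, "cit"))   # out-of-vocabulary word
```

Because the widest context of a training word spans the whole padded word, lookups for in-vocabulary words return their stored transcription exactly, which mirrors the exactness claim in the abstract; unseen words fall back to progressively narrower contexts.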
Similar resources
Automatic generation of non-uniform context-dependent HMM topologies based on the MDL criterion
We propose a new method of automatically creating nonuniform context-dependent HMM topologies by using the Minimum Description Length (MDL) criterion. Phonetic decision tree clustering is widely used, based on the Maximum Likelihood (ML) criterion, and creates only contextual variations. However, it also needs to empirically predetermine control parameters for use as stop criteria, for example,...
Automatic question generation for decision tree based state tying
Decision tree based state tying uses so-called phonetic questions to assign triphone states to reasonable acoustic models. These phonetic questions are in fact phonetic categories such as vowels, plosives or fricatives. The assumption behind this is that context phonemes which belong to the same phonetic class have a similar influence on the pronunciation of a phoneme. For a new phoneme set, wh...
Phonetic Question Generation Using Misrecognition
Most automatic speech recognition systems are currently based on tied state triphones. These tied states are usually determined by a decision tree. Decision trees can automatically cluster triphone states into many classes according to data available allowing each class to be trained efficiently. In order to achieve higher accuracy, this clustering is constrained by manually generated phonetic ...
ISIP 2000 Conversational Speech Evaluation System
In this paper, we describe the ISIP Automatic Speech Recognition system (ISIP-ASR) used for the Hub-5 2000 English evaluations. The system is a public domain cross-word context-dependent HMM based system and has all the functionality normally expected in an LVCSR system, including Baum-Welch training for continuous density HMMs, phonetic decision tree-based state-tying, word graph generation an...
Towards a non-parametric acoustic model: an acoustic decision tree for observation probability calculation
Modern automatic speech recognition systems use Gaussian mixture models (GMM) on acoustic observations to model the probability of producing a given observation under any one of many hidden discrete phonetic states. This paper investigates the feasibility of using an acoustic decision tree to directly model these probabilities. Unlike the more common phonetic decision tree, which asks questions...